False Sharing and Spatial Locality in Multiprocessor Caches
Abstract
The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors. Some researchers have speculated that this effect is due to false sharing, the coherence transactions that result when different processors update different words of the same cache block in an interleaved fashion. While the analysis of six applications in this paper confirms that false sharing has a significant impact on the miss rate, the measurements also show that poor spatial locality among accesses to shared data has an even larger impact. To mitigate false sharing and to enhance spatial locality, we optimize the layout of shared data in cache blocks in a programmer-transparent manner. We show that this approach can reduce the number of misses on shared data by about 10% on average.
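The definition of false sharing given in the abstract can be made concrete with a toy model. The sketch below is not the paper's simulator or methodology: it assumes an idealized write-invalidate protocol with infinite caches, and the names `count_misses` and `interleaved_writes` are hypothetical helpers introduced only for illustration. It counts misses when two processors write to different words in an interleaved fashion, first with both words in one cache block and then with the words placed in separate blocks.

```python
def count_misses(trace, block_size):
    """Count misses in a toy write-invalidate model (assumed, not the paper's).

    trace: list of (processor_id, byte_address, is_write) tuples.
    Caches are assumed infinite, so every miss is a cold or coherence miss.
    """
    holders_by_block = {}  # block number -> set of processors with a valid copy
    misses = 0
    for proc, addr, is_write in trace:
        block = addr // block_size
        holders = holders_by_block.setdefault(block, set())
        if proc not in holders:
            # The processor has no valid copy of this block: a miss.
            misses += 1
            holders.add(proc)
        if is_write:
            # A write invalidates every other processor's copy of the block.
            holders.intersection_update({proc})
    return misses


def interleaved_writes(addr0, addr1, rounds):
    """Processor 0 writes addr0 and processor 1 writes addr1, alternating."""
    trace = []
    for _ in range(rounds):
        trace.append((0, addr0, True))
        trace.append((1, addr1, True))
    return trace


if __name__ == "__main__":
    adjacent = interleaved_writes(0, 4, rounds=100)   # two words, same 64-byte block
    separated = interleaved_writes(0, 64, rounds=100)  # words in different blocks
    print("adjacent words, 64-byte blocks:", count_misses(adjacent, 64))
    print("separated words, 64-byte blocks:", count_misses(separated, 64))
    print("adjacent words, 4-byte blocks:", count_misses(adjacent, 4))
```

In this model the adjacent layout misses on every access, because each write invalidates the other processor's copy of the shared block, while the separated layout incurs only the two initial cold misses. Shrinking the block to one word has the same effect as separating the data, which is the tension between block size and sharing granularity that the abstract describes.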
Similar resources
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coherence unit, and the spatial locality exhibited by the applications, in addition to the amount of parallelism in the applications. Large coherence units are helpful in exploiting spatial locality, but worsen the effect...
Reconciling Sharing and Spatial Locality Using Adjustable Block Size Coherent Caches
Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increas...
False Sharing and Spatial Locality in Multiprocessor Caches
The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors...
Design and Performance Evaluation of an Adaptive Cache Coherence Protocol
In shared-memory multiprocessor systems, the local caches used to tolerate the performance gap between processor and memory cause additional bus transactions to maintain the coherency of shared data. In particular, coherency misses and data traffic due to spatial locality and false sharing have a significant effect on system performance. In this approach, an adaptive cache coherence ...
Compiler Optimizations for Cache Locality and Coherence
Almost every modern processor is designed with a memory hierarchy organized into several levels, each of which is smaller, faster, and more expensive than the level below. High performance requires the effective use of the cached data, i.e. cache locality. Smart compiler transformations can relieve the programmer from hand-optimizing for the specific machine architectures. In a multiprocessor sys...